Quote Extraction and Attribution from Norwegian Newspapers

Authors

  • Andrew Salway
  • Paul Meurer
  • Knut Hofland
  • Øystein Reigem
Abstract

We present ongoing work that, for the first time, seeks to extract and attribute politicians’ quotations from Norwegian Bokmål newspapers. Our method – using a statistical dependency parser, a few regular expressions and a look-up table – gives modest recall (a best of .570) but very high precision (.978) and attribution accuracy (.987) for a restricted set of speaker names. We suggest that this is already sufficient to support some kinds of important social science research, but also identify ways in which performance could be improved.
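As a rough illustration of the regular-expression and look-up-table part of such a pipeline, the sketch below matches one direct-quote pattern («quote», verb, name) and maps the matched name to a canonical speaker. It is a minimal sketch under stated assumptions, not the authors' implementation: the speaker table, the verb list, the pattern and the function name `extract_quotes` are all hypothetical, and the dependency-parsing step mentioned in the abstract is omitted entirely.

```python
import re

# Hypothetical look-up table: name variants -> canonical speaker name.
# The restricted speaker set used in the paper is not given here.
SPEAKERS = {
    "Erna Solberg": "Erna Solberg",
    "Solberg": "Erna Solberg",
    "Jonas Gahr Støre": "Jonas Gahr Støre",
    "Støre": "Jonas Gahr Støre",
}

# Illustrative (assumed) set of Norwegian speech verbs.
VERBS = r"(?:sier|sa|mener|uttaler)"

# Longest variants first, so full names are preferred over bare surnames.
NAMES = "|".join(sorted((re.escape(n) for n in SPEAKERS), key=len, reverse=True))

# One assumed surface pattern: «quote», <verb> <name>
PATTERN = re.compile(
    rf"«(?P<quote>[^»]+)»,?\s+{VERBS}\s+(?P<speaker>{NAMES})\b"
)


def extract_quotes(text):
    """Return (quote, canonical speaker) pairs found by the single pattern."""
    return [
        (m.group("quote").strip(), SPEAKERS[m.group("speaker")])
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "«Vi må kutte utslippene», sier Erna Solberg. "
        "«Det er for lite», mener Støre."
    )
    for quote, speaker in extract_quotes(sample):
        print(f"{speaker}: {quote}")
```

A narrow pattern with a closed name list tends to trade recall for precision, which matches the general shape of the results reported in the abstract; any quotation phrased differently is simply missed.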


Similar resources

Ord i Dag: Mining Norwegian Daily Newswire

We present Ord i Dag, a new service that displays today's most important keywords. These are extracted fully automatically from Norwegian online newspapers. Describing the complete process, we provide an entirely disclosed method for media monitoring and news summarization. For keyword extraction, a reference corpus serves as background about average language use, which is contrasted with the c...


A Sequence Labelling Approach to Quote Attribution

Quote extraction and attribution is the task of automatically extracting quotes from text and attributing each quote to its correct speaker. The present state-of-the-art system uses gold standard information from previous decisions in its features, which, when removed, results in a large drop in performance. We treat the problem as a sequence labelling task, which allows us to incorporate seque...


Automatically Detecting and Attributing Indirect Quotations

Direct quotations are used for opinion mining and information extraction as they have an easy to extract span and they can be attributed to a speaker with high accuracy. However, simply focusing on direct quotations ignores around half of all reported speech, which is in the form of indirect or mixed speech. This work presents the first large-scale experiments in indirect and mixed quotation ex...


A Study of Information Extraction Tools for Online English Newspapers (PDF): Comparative Analysis

Information retrieval is the task of retrieving relevant and useful information from e-newspapers. Electronic newspapers are electronic replicas of traditional newspapers. E-newspapers are becoming increasingly popular because of the ease and convenience in accessing them. Newspapers are the source of timely information. These are the documents comprising news items and several independent info...


Examining the Impact of Coreference Resolution on Quote Attribution

Quote attribution is the task of identifying the speaker of each quote within a document. While recent research has established large-scale corpora for this task, these corpora are not yet consistent in the way they handle candidate speakers, and many of the reported results rely on gold standard annotations of both entities and coreference chains. In this work we evaluate three quote attributi...




Publication date: 2017